AITopics | parameter overhead

Collaborating Authors

parameter overhead

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models

Zhou, Andy

arXiv.org Artificial IntelligenceMar-16-2025

Adapting large language models to multiple tasks can cause cross-skill interference, where improvements for one skill degrade another. While methods such as LoRA impose orthogonality constraints at the weight level, they do not fully address interference in hidden-state representations. We propose Compositional Subspace Representation Fine-tuning (CS-ReFT), a novel representation-based approach that learns multiple orthonormal subspace transformations, each specializing in a distinct skill, and composes them via a lightweight router. By isolating these subspace edits in the hidden state, rather than weight matrices, CS-ReFT prevents cross-task conflicts more effectively. On the AlpacaEval benchmark, applying CS-ReFT to Llama-2-7B achieves a 93.94% win rate, surpassing GPT-3.5 Turbo (86.30%) while requiring only 0.0098% of model parameters. These findings show that specialized representation edits, composed via a simple router, significantly enhance multi-task instruction following with minimal overhead.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2503.10617

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Munkhdalai, Tsendsuren, Chen, Youzheng, Sim, Khe Chai, Biadsy, Fadi, Sainath, Tara, Mengibar, Pedro Moreno

arXiv.org Artificial IntelligenceMar-25-2024

Parameter efficient adaptation methods have become a key mechanism to train large pre-trained models for downstream tasks. However, their per-task parameter overhead is considered still high when the number of downstream tasks to adapt for is large. We introduce an adapter module that has a better efficiency in large scale multi-task adaptation scenario. Our adapter is hierarchical in terms of how the adapter parameters are allocated. The adapter consists of a single shared controller network and multiple task-level adapter heads to reduce the per-task parameter overhead without performance regression on downstream tasks. The adapter is also recurrent so the entire adapter parameters are reused across different layers of the pre-trained model. Our Hierarchical Recurrent Adapter (HRA) outperforms the previous adapter-based approaches as well as full model fine-tuning baseline in both single and multi-task adaptation settings when evaluated on automatic speech recognition tasks.

adapter, adapter head, hra, (14 more...)

arXiv.org Artificial Intelligence

2403.19709

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech (0.70)

Add feedback

Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures

Fu, Zirui, Avaliani, Aleksandre, Donato, Marco

arXiv.org Artificial IntelligenceApr-12-2023

Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.161

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Massachusetts > Middlesex County > Medford (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(7 more...)

Genre: Research Report (0.40)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

Wang, Wenjin, Hu, Yunqing, Chen, Qianglong, Zhang, Yin

arXiv.org Artificial IntelligenceApr-11-2023

Parameter regularization or allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they solve all tasks in a sequence uniformly and ignore the differences in the learning difficulty of different tasks. So parameter regularization methods face significant forgetting when learning a new task very different from learned tasks, and parameter allocation methods face unnecessary parameter overhead when learning simple tasks. In this paper, we propose the Parameter Allocation & Regularization (PAR), which adaptively select an appropriate strategy for each task from parameter allocation and regularization based on its learning difficulty. A task is easy for a model that has learned tasks related to it and vice versa. We propose a divergence estimation method based on the Nearest-Prototype distance to measure the task relatedness using only features of the new task. Moreover, we propose a time-efficient relatedness-aware sampling-based architecture search strategy to reduce the parameter overhead for allocation. Experimental results on multiple benchmarks demonstrate that, compared with SOTAs, our method is scalable and significantly reduces the model's redundancy while improving the model's performance. Further qualitative analysis indicates that PAR obtains reasonable task-relatedness.

artificial intelligence, learning, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2304.05288

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre:

Research Report (0.64)
Instructional Material (0.62)

Industry: Education > Educational Setting > Continuing Education (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network

Chang, Oscar, Yao, Yuling, Williams-King, David, Lipson, Hod

arXiv.org Machine LearningMay-22-2019

Two main obstacles preventing the widespread adoption of variational Bayesian neural networks are the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as "programming overhead." MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps these obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers, while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained with ImageNet.

arxiv preprint arxiv, neural network, parameter overhead, (14 more...)

arXiv.org Machine Learning

1905.09453

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback